Abstract. We introduce a simplified protein model where the water degrees of
freedom appear explicitly (although in an extremely simplified fashion). Using this
model we are able to recover both the warm and the cold protein denaturation within
a single framework, while addressing important issues about the structure of model
proteins.

1. Introduction

Proteins are extremely complex structures: they are long heteropolymers
made of up to 20 different amino-acid species, each with its own
chemical, electrostatic and steric properties; the physiological
solvent, an aqueous solution, and its characteristics play a fundamental
role both in the dynamics and in the thermodynamics of folding. It is
therefore not surprising that statistical physicists began working on
this problem only recently, mainly after the introduction of the
so-called HP model [1], where the above-mentioned richness has been
reduced to a manageable level. In the HP model, proteins are modeled
as self-avoiding polymers on a lattice (two- or three-dimensional),
greatly reducing the number of accessible conformations. The chemical
and electrostatic properties of amino-acids have also been simplified:
indeed, it has long been recognized that the main force stabilizing
the native conformations of globular proteins is the hydrophobicity of
non-polar amino-acids [?]. Consequently, the important properties of
amino-acids are reduced to two: they are either polar (ions or dipoles,
labeled with P) or non-polar (H).

Hydrophobicity can be described as the tendency of hydrophobic
molecules to reduce as much as possible their surface of contact with
water: two hydrophobic molecules try to stick together in order to hide
their mutual surface of contact from water. Consequently, hydrophobicity
has been introduced in the HP model as an effective attractive
interaction between H amino-acids, so that the solvent degrees of
freedom can be neglected. Here we show that such a simplification can be
removed, and water can be taken into account, while keeping the
complexity of the model at a manageable level: the benefits are a better
description of the protein phenomenology (namely, cold destabilization
and, possibly, cold denaturation [?, 5]) and some insights into the
structure of the protein core.

In the last fifteen years there has been a growing body of evidence
for the so-called cold destabilisation of proteins: the free energy
difference $\Delta F$ between denatured and native conformations of
proteins has a parabolic shape as a function of temperature, with a
maximum at temperatures of the order of 15-25 °C, or lower, implying
that at lower temperatures the native conformation is less and less
stable. In some cases, even the cold denaturation of proteins has been
observed.

There are at least two reasons to believe that a good description of 
cold destabilisation and denaturation is relevant to protein folding. 

In order to describe protein folding with a simple model, it is impor- 
tant to capture the essential physics of the process, at the temperatures 
at which it takes place. If the stability of native conformations of
proteins begins to decrease below 15-25 °C, it is unlikely, at least a priori,
that the physics responsible for such a behavior is not important around 
the maximal stability temperature, in a range relevant for in vivo pro- 
tein folding. A further reason to believe that a good model for protein 
folding should also agree with the cold destabilisation phenomenology 
is that, actually, there is no clear-cut distinction between the physics 
that stabilizes proteins, and the one that destabilizes them. In both 
cases a re-analysis of the concept of hydrophobicity and of hydrophobic 
hydration is necessary. 

Frank and Evans already identified the origin of hydrophobicity
in the partial ordering of water around non-polar molecules (such as,
for example, pentane, benzene and some amino-acids). Water molecules
tend to build ice-like cages around non-polar molecules. Although a
detailed analysis of these structures is, to our knowledge, still
lacking (some better understanding and consensus have recently been
emerging [?, ?, ?, 11]), we can guess their energetic and entropic
properties. Indeed, water molecules forming these cages are highly
hydrogen bonded, much as in ice; consequently, their formation is
energetically favorable with respect to bulk liquid water. Yet the
possible molecular arrangements in the cages are few compared with all
the disordered molecular conformations typical of liquid water. The
latter, when they occur next to a non-polar molecule, are energetically
unfavorable with respect to bulk water because water molecules fail to
form hydrogen bonds with hydrophobic amino-acids. Therefore the free
energy of formation of a cage ($\Delta F = F_{cage} - F_{no\ cage}$)
is a balance between an enthalpy gain/loss and an entropy loss/gain:
ordered cages give an enthalpy gain ($\Delta H < 0$) and an entropy loss
($\Delta S < 0$); the scenario is the opposite for disordered states.
All of the above arguments call for a model able to reproduce (at least
qualitatively) such a rich phenomenology.



2. The HP-Water Model 

The model we propose here borrows two simplifications from the
HP model: proteins are still modeled as heteropolymers on a lattice,
made of just two different amino-acid species, polar (P) and non-polar
(H). Every site of the lattice that is not occupied by the polymer is
occupied by water (in general, by a group of water molecules that can
be arranged in q states).

2.1. Two-State Models for Water 

Water is described using the Muller-Lee-Graziano (MLG) two-state
model (Figure 1a) [?]. The energy of each H amino-acid depends
on the states of the water molecules it is in contact with (the water
molecules in the hydration shell). As mentioned above, hydration
water can build ordered cages around the molecule, which are
energetically favorable with respect to the possible disordered
configurations, hence $E_{ds} > E_{os}$. Yet the disordered
configurations outnumber the ordered ones: $q_{ds} > q_{os}$. Water
molecules that are not in contact with non-polar molecules (bulk water)
are described by a two-state model as well: water molecules can build
networks of hydrogen bonds that are energetically favorable with
respect to disordered configurations ($E_{db} > E_{ob}$), even if there
are far more disordered configurations than ordered ones
($q_{db} > q_{ob}$). The above arguments hold separately for bulk
and shell molecules. In order to understand the relative order of the
energies and of the degeneracies we need to describe the effects of the
transfer of water molecules from the bulk to the hydration shell.
Indeed, hydrogen bonds between hydration shell water molecules are on
average stronger than hydrogen bonds in the bulk ($E_{ob} > E_{os}$);
conversely, the number of hydrogen-bonded configurations in the
hydration shell is smaller than in the bulk ($q_{ob} > q_{os}$).
Actually, the two inequalities are mutually consistent: the greater the
number of equivalent configurations, the higher the probability of
switching from one to another; therefore, the persistence time of the
bulk hydrogen bonds is shorter than the persistence time in the
hydration shell, whence the energy inequality. The ordered bulk
orientations that do not contribute to ordered shell configurations can
be counted among the disordered shell configurations, which are
therefore more abundant than the disordered bulk configurations.
Moreover, such configurations are energetically less favorable than
bulk disordered configurations because every time a water molecule
points one of its bonding directions toward the non-polar molecule, it
loses energy: therefore $E_{ds} > E_{db}$ and $q_{ds} > q_{db}$. Such
a hand-waving picture has been recently confirmed by Silverstein et al.,
who derived the double two-state MLG model using a molecular model
of the water-amino acid system [?].
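As a rough numerical illustration of the double two-state picture, the sketch below evaluates the free energy of turning one bulk water site into a hydration-shell site, $\Delta F_{tr}(T) = -T \ln(Z_{shell}/Z_{bulk})$, with $k_B = 1$. The parameter values are those quoted in the caption of Figure 3; the function names and the choice of this particular observable are assumptions made for illustration and are not taken from the text.

```python
import math

# MLG parameters as quoted in the caption of Figure 3 (k_B = 1).
MLG = {
    "os": (-1.4, 1),    # ordered shell:    (energy, degeneracy)
    "ds": (1.8, 999),   # disordered shell
    "ob": (-1.0, 50),   # ordered bulk
    "db": (1.0, 950),   # disordered bulk
}


def two_state_z(ordered, disordered, T):
    """Partition function of one two-state water site at temperature T."""
    (e_o, q_o), (e_d, q_d) = ordered, disordered
    return q_o * math.exp(-e_o / T) + q_d * math.exp(-e_d / T)


def transfer_free_energy(T, p=MLG):
    """Free energy of turning one bulk site into a shell site (illustrative)."""
    z_shell = two_state_z(p["os"], p["ds"], T)
    z_bulk = two_state_z(p["ob"], p["db"], T)
    return -T * math.log(z_shell / z_bulk)


if __name__ == "__main__":
    for T in (0.1, 0.3, 1.0, 3.0):
        print(f"T = {T:4.1f}   dF_transfer = {transfer_free_energy(T):+.3f}")
```

With these numbers the transfer free energy is negative at low temperature, where the ordered cage dominates, and positive at high temperature, where the higher energy of the disordered shell states is no longer compensated.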

The simple MLG model is extremely effective in describing the
thermodynamics of solvation of hydrocarbons (which, as recalled above,
are strongly hydrophobic: indeed the side chains of hydrophobic
amino-acids are essentially hydrocarbons, e.g., the side chain of
leucine is an isobutyl group) [?]. The degeneracies and energy
differences can be fitted to experiments in order to obtain detailed
values. The MLG model has six free parameters (one degeneracy and one
energy can be chosen as reference): too many for a simple theoretical
model. We therefore introduce the Bimodal model (BM): as a simplifying
assumption (see Figure 1b), we say that out of the q possible states of
a water molecule, one can be singled out as the cage conformation
(labeled s = 0), energetically favorable with energy $-J$ ($J > 0$),
and the remaining $q - 1$ states ($s = 1, \ldots, q - 1$) are
energetically unfavorable with energy $K > 0$ (they represent the
disordered states of reduced hydrogen-bond coordination). We stress
that the term (un)favorable is always with respect to bulk liquid
water. Bulk water molecules that are not in contact with H amino-acids
do not contribute to the energy. Such a model is much simpler than the
MLG one, yet it yields qualitatively similar results, as we shall show.
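The statistics of a single hydration-shell water site in the bimodal model can be sketched in a few lines. The code below is only an illustration under the definitions just given (one cage state of energy $-J$, $q - 1$ disordered states of energy $K$, $k_B = 1$); the function name and the default parameter values J = 1, K = 2, q = 1000 are choices made here for the example.

```python
import math


def bm_site(T, J=1.0, K=2.0, q=1000):
    """Single shell water site in the bimodal model (k_B = 1).

    Returns (z, p_cage, mean_energy): the site partition function, the
    probability of the ordered cage state s = 0, and the average energy.
    """
    w_cage = math.exp(J / T)             # one state with energy -J
    w_dis = (q - 1) * math.exp(-K / T)   # q - 1 states with energy +K
    z = w_cage + w_dis
    p_cage = w_cage / z
    mean_energy = -J * p_cage + K * (1.0 - p_cage)
    return z, p_cage, mean_energy


if __name__ == "__main__":
    for T in (0.1, 0.3, 0.5, 1.0, 10.0):
        z, p, e = bm_site(T)
        print(f"T = {T:5.1f}   p_cage = {p:.4f}   <E> = {e:+.4f}")
```

The cage and the disordered states are equally populated at $T^* = (J + K)/\ln(q - 1)$: below it the site is essentially frozen in the cage, above it the large degeneracy of the disordered states wins.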

2.2. The Model 

As we mentioned above, we model proteins as polymers on a lattice.
Monomers can be either hydrophobic or polar. Sites that are not
occupied by any amino-acid are occupied by water. The energy of the
model is given by the energy of the water sites. Every water site is
occupied by several water molecules, so that its available energy
states should be given by a suitable convolution of several MLG or BM
models, as described in the previous section. As a simplifying
assumption, we describe the energetics of a water site using a single
two-state model, choosing the bulk or shell parametrisation depending
on whether or not the site is in contact with an H monomer. P
amino-acids do not interact with water, so that their energy is always
0: such a crude approximation is made with the idea that hydrophobicity
is the leading effect stabilizing the native conformation of proteins.
A better description of the water-P interaction would be welcome, but
such an ingredient is unnecessary for our present purposes.

Indeed, in the original formulation of the HP model, only
interactions between hydrophobic amino-acids were considered. Although
this is clearly a strong assumption, it has the advantage of keeping
the model as simple as possible and of clearly addressing the effect of
hydrophobicity alone. On the other hand, the various approximations of
the model imply that the questions we can answer are somewhat limited.
In this paper we look only at the thermodynamic behavior of proteins,
and at the simplest of the structural features, that is the segregation
of hydrophobic residues in the core of the protein. Other important
problems can be addressed already within the HP-model scheme, such
as the relation between sequences and structures, and in particular the
designability of the latter. We will tackle such issues in future works.

In what follows we will make explicit use of the BM model: formulas
for the MLG model are similar and can be easily derived. Given a
protein of N amino-acids, with the sequence
$\sigma_1, \sigma_2, \ldots, \sigma_N$ ($\sigma_i$ = P or H), the
energy of the protein is then

$E = \sum_{\langle j,H \rangle} \left[ -J \delta_{s_j,0} + K (1 - \delta_{s_j,0}) \right]$    (1)

where the sum is over the water sites j that are nearest neighbors of
some H amino-acid, and $s_j$ is the state of water site j. Starting
from (1) we can write the partition function of the system as

$Z_N = \sum_C Z_N(C)$    (2)

where $Z_N(C)$ is the partition function associated to a single
conformation C:

$Z_N(C) = q^{n_0(C)} \left[ (q-1) e^{-\beta K} + e^{\beta J} \right]^{n_1(C)}$    (3)

and the dependence on the water degrees of freedom has been explicitly
calculated. Here $\beta = 1/(k_B T)$, $n_1(C)$ is the number of water
sites that are nearest neighbors of some H amino-acid, and $n_0(C)$ is
the number of water sites in the bulk or in contact only with P
amino-acids. We also keep the description of water as simple as
possible, neglecting any interactions between different water sites.
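To make Eq. (3) concrete, the sketch below counts the hydration-shell water sites $n_1(C)$ of a conformation on the square lattice and evaluates its statistical weight. Since $n_0 + n_1$ is the same for every conformation on a given lattice, the common factor is dropped and only $n_1 \ln(z/q)$ is returned, with $z = (q-1) e^{-\beta K} + e^{\beta J}$. The data layout (a list of coordinates plus an H/P string) and the function names are assumptions for this example.

```python
import math

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def shell_sites(conformation, sequence):
    """Water sites that are nearest neighbours of at least one H monomer.

    `conformation` is a list of (x, y) lattice positions, one per monomer;
    `sequence` is a string over {"H", "P"} of the same length.
    """
    occupied = set(conformation)
    shell = set()
    for (x, y), kind in zip(conformation, sequence):
        if kind != "H":
            continue
        for dx, dy in NEIGHBOURS:
            site = (x + dx, y + dy)
            if site not in occupied:
                shell.add(site)   # a set, so shared sites are counted once
    return shell


def log_weight(conformation, sequence, T, J=1.0, K=2.0, q=1000):
    """ln Z_N(C) up to a conformation-independent constant (k_B = 1)."""
    z = (q - 1) * math.exp(-K / T) + math.exp(J / T)
    n1 = len(shell_sites(conformation, sequence))
    return n1 * math.log(z / q)


if __name__ == "__main__":
    pair = [(0, 0), (1, 0)]
    print(len(shell_sites(pair, "HH")), len(shell_sites(pair, "HP")))
    print(log_weight(pair, "HH", T=0.5))
```

Using a set for the shell matters: a water site touching two H monomers contributes a single factor in Eq. (3), which is what makes burying hydrophobic monomers together thermodynamically relevant.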

We deal with model proteins of length up to N = 17 on the square
lattice, and compute the partition function, and all the thermodynamic
quantities and averages, by exact enumeration of the 2155667 different
conformations. We show the results for the particular sequence
PHPPHPPHPHPPHPPHH. Such a sequence has been chosen to
have a compact state with all the hydrophobic amino-acids in the core.
That such a state is the native one (most stable and unique) is what
we must check with the statistical mechanical treatment. We choose
J = 1 (actually, both K and the temperature T can be normalized
with respect to J), K = 2 and $q = 10^3$ (a better determination of these
values could come from molecular dynamics and structural studies).
We take the Boltzmann constant $k_B = 1$.
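The exact enumeration can be sketched as a depth-first search over self-avoiding walks. The convention used below for removing rotations and reflections (first bond fixed along +x, first turn forced upward) is an assumption of this example; it gives $(c_n + 4)/8$ conformations, where $c_n$ is the number of n-step self-avoiding walks on the square lattice. For 17 monomers this is $(17245332 + 4)/8 = 2155667$, which agrees with the number quoted above.

```python
def count_conformations(n_monomers):
    """Count self-avoiding chains of n_monomers sites on the square lattice,
    identifying chains related by rotations and reflections."""
    steps = ((1, 0), (0, 1), (-1, 0), (0, -1))

    def extend(path, visited, turned):
        if len(path) == n_monomers:
            return 1
        x, y = path[-1]
        total = 0
        for dx, dy in steps:
            # Until the chain leaves the x axis, forbid downward steps:
            # this discards the mirror image of every bent chain.
            if not turned and dy < 0:
                continue
            site = (x + dx, y + dy)
            if site in visited:
                continue
            path.append(site)
            visited.add(site)
            total += extend(path, visited, turned or dy != 0)
            path.pop()
            visited.remove(site)
        return total

    if n_monomers < 2:
        return 1
    # Fixing the first bond along +x removes the four lattice rotations.
    start = [(0, 0), (1, 0)]
    return extend(start, set(start), False)


if __name__ == "__main__":
    for n in range(2, 11):
        print(n, count_conformations(n))
```

In pure Python the 17-monomer case takes noticeably longer than the short chains printed here, but the recursion depth never exceeds the chain length, so no special handling is needed.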



3. Results 

In Figure 2 the specific heat $C_v$ and the average number of
monomer-monomer contacts, $n_c$, are shown. The low-temperature peak in
the specific heat coincides with a jump of $n_c$: at lower temperatures
the protein is swollen, and maximizes the number of water-H contacts,
in agreement with cold denaturation. The number of contacts $n_c$
begins to decrease at the high-temperature peak of the specific heat,
which therefore corresponds to the usual warm denaturation phenomenon.

Between $T_c$ and $T_w$, the temperatures of the cold and warm
denaturation peaks, there is a region where the most probable
conformation is the one represented in the inset of Figure 2: as can be
seen, it is compact with a hydrophobic core, out of reach for water (we
also checked that this native state is unique, in that its Boltzmann
weight is the largest above $T_c$). We have analyzed the behavior of
different protein lengths and of different sequences, and we have
always found the same qualitative behavior of $C_v$ and $n_c$. Our
model is therefore able to describe, within a single framework, both
cold and warm denaturation. Moreover, it shows a native state with a
mostly hydrophobic core.
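The mechanism can be checked on a chain short enough to enumerate by hand. The sketch below, an illustration and not the calculation behind Figure 2, enumerates the conformations of a 4-monomer toy sequence HPPH, collects them by number of shell water sites $n_1$ and monomer-monomer contacts $n_c$, and computes $\langle n_c \rangle$ and the specific heat from the bimodal-model weights of Eq. (3) with $k_B = 1$. The toy sequence, the symmetry convention and all function names are assumptions of this example.

```python
import math
from collections import Counter

STEPS = ((1, 0), (0, 1), (-1, 0), (0, -1))


def walks(n):
    """Yield symmetry-reduced self-avoiding chains of n >= 2 sites."""
    def grow(path, visited, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            site = (x + dx, y + dy)
            if (not turned and dy < 0) or site in visited:
                continue
            path.append(site)
            visited.add(site)
            yield from grow(path, visited, turned or dy != 0)
            path.pop()
            visited.remove(site)
    start = [(0, 0), (1, 0)]
    yield from grow(start, set(start), False)


def histogram(sequence):
    """Count conformations by (n1, nc): shell water sites and contacts."""
    hist = Counter()
    for walk in walks(len(sequence)):
        index = {site: i for i, site in enumerate(walk)}
        shell, contacts = set(), 0
        for i, (x, y) in enumerate(walk):
            for dx, dy in STEPS:
                j = index.get((x + dx, y + dy))
                if j is None:
                    if sequence[i] == "H":
                        shell.add((x + dx, y + dy))
                elif j > i + 1:
                    contacts += 1      # non-bonded neighbours, counted once
        hist[(len(shell), contacts)] += 1
    return hist


def averages(hist, T, J=1.0, K=2.0, q=1000):
    """Return (<n_c>, C_v) at temperature T for the bimodal model."""
    w0, w1 = math.exp(J / T), (q - 1) * math.exp(-K / T)
    z = w0 + w1
    e1 = (-J * w0 + K * w1) / z                     # mean energy per shell site
    v1 = (J * J * w0 + K * K * w1) / z - e1 * e1    # its energy variance
    logw = {k: math.log(g) + k[0] * math.log(z / q) for k, g in hist.items()}
    top = max(logw.values())
    p = {k: math.exp(v - top) for k, v in logw.items()}
    norm = sum(p.values())
    n1 = sum(k[0] * w for k, w in p.items()) / norm
    n1sq = sum(k[0] ** 2 * w for k, w in p.items()) / norm
    nc = sum(k[1] * w for k, w in p.items()) / norm
    var_e = v1 * n1 + e1 * e1 * (n1sq - n1 * n1)    # law of total variance
    return nc, var_e / (T * T)


if __name__ == "__main__":
    h = histogram("HPPH")
    for T in (0.1, 0.2, 0.5, 2.0, 100.0):
        nc, cv = averages(h, T)
        print(f"T = {T:6.1f}   <n_c> = {nc:.4f}   C_v = {cv:.4f}")
```

Even this tiny chain is open at low temperature, closed at intermediate temperature and open again at high temperature, which is the qualitative pattern described for $n_c$ above.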

Although the ratio $T_w/T_c \approx 10$ between the warm and cold
denaturation temperatures in Figure 2 is unphysical (from experiments
$T_w/T_c \approx 1.5$), using the full MLG model it is possible to come
closer to real values: in Figure 3 the ratio is $T_w/T_c \approx 3$,
and going toward more and more refined models of water and of
water/amino-acid interactions it is surely possible to get physical
values of the ratio. Of course the price to be paid is the larger
number of parameters to adjust. In this work we address the physical
principles responsible for the thermodynamic behavior of proteins over
a broad range of temperatures: we believe that the differences between
the bimodal model and the MLG model (and other possible more refined
models) govern the details of the behavior more than the essential
features.

We next compare the free energy, enthalpy and entropy variations of
folding of our model with those from the literature [5]. Such a
comparison is a difficult one, since it is hard to define what a
denatured state is in our theoretical calculations, and even the
experiments have not yet been conclusive on this issue. Therefore, as a
simple approximation, we consider as denatured those conformations with
at most 4 monomer-monomer contacts (a polymer of 17 monomers on a
square lattice has at most 9 monomer-monomer contacts). The native
state has 8 monomer-monomer contacts.

In Figure 4 we show
$\Delta F = F_{Denatured} - F_{Native} = \Delta H - T \Delta S$,
together with $\Delta H$ and $T \Delta S$. They coincide qualitatively
with those from experiments [?, 5]. We point out the presence of two
temperatures such that $\Delta F < 0$ below the lower one and above the
higher one: there the denatured state of our model protein is more
stable than the native state. Between these two temperatures, instead,
$\Delta F > 0$, and the native state is the most stable. In the same
temperature range where $\Delta F > 0$, $\Delta H$ and $T \Delta S$
have a strong temperature dependence: they even change sign, a
signature of the rich physics behind the water-protein system. At high
temperatures we find that both $\Delta H$ and $\Delta S$ saturate
($T \Delta S$ grows linearly, therefore $\Delta S$ saturates), as
experimentally observed [5]. Particular care should be paid to the
low-temperature behavior of $\Delta H$ and $T \Delta S$. Indeed,
$\Delta H$ goes to a constant value, which is consistent with a lower
bound for the energies, and $T \Delta S$ tends to 0 with T. Experiments
should be made below $T_c$ to assess such a behavior (although a recent
model suggests such a scenario [10]). We find therefore that our model
reproduces qualitatively the known calorimetric data of protein
denaturation over a broad range of temperatures.
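The three folding quantities can be obtained from the shell-site counts alone. The sketch below assumes the bimodal model with $k_B = 1$, a single native conformation, and a denatured ensemble given as a map from $n_1$ to the number of conformations with that $n_1$; these inputs and the function name are assumptions for illustration. The demo values describe a 4-monomer toy chain HPPH (four open conformations with six shell sites each, one closed conformation with four), not the 17-monomer protein of the text.

```python
import math


def folding_thermodynamics(T, denatured, native_n1, J=1.0, K=2.0, q=1000):
    """Return (dF, dH, T*dS), denatured minus native, with k_B = 1.

    `denatured` maps n1 (number of shell water sites) to the number of
    denatured conformations with that n1; `native_n1` is n1 of the
    native conformation.
    """
    w0, w1 = math.exp(J / T), (q - 1) * math.exp(-K / T)
    z = w0 + w1
    site_energy = (-J * w0 + K * w1) / z      # mean energy of one shell site
    log_x = math.log(z / q)                   # ln of the weight per shell site
    logs = {n1: math.log(g) + n1 * log_x for n1, g in denatured.items()}
    top = max(logs.values())
    log_z_den = top + math.log(sum(math.exp(v - top) for v in logs.values()))
    mean_n1 = sum(n1 * math.exp(v - log_z_den) for n1, v in logs.items())
    d_f = -T * (log_z_den - native_n1 * log_x)
    d_h = site_energy * (mean_n1 - native_n1)
    return d_f, d_h, d_h - d_f                # T*dS = dH - dF


if __name__ == "__main__":
    open_states, closed_n1 = {6: 4}, 4        # toy HPPH chain
    for T in (0.05, 0.1, 0.5, 2.0, 100.0):
        d_f, d_h, t_ds = folding_thermodynamics(T, open_states, closed_n1)
        print(f"T = {T:6.2f}  dF = {d_f:+9.3f}  dH = {d_h:+7.3f}  TdS = {t_ds:+9.3f}")
```

For the toy input $\Delta F$ is negative at low and at high temperature and positive in between, and $\Delta H$ tends to the constant $-2J$ as $T \to 0$, in line with the qualitative statements above.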



4. Effective Interaction 
4.1. Two-Body Interactions 

The hydrophobic effect is often modeled through attractive effective
HH interactions. Within our framework (and on a square lattice, for
simplicity), we consider a system of two H amino-acids in solution
and we compare the partition function of the system when the two
amino-acids are in contact,

$Z_c = q^2 \left[ e^{\beta J} + (q-1) e^{-\beta K} \right]^6$ ,    (4)

with the one when the distance between the two amino-acids is infinite,

$Z_0 = \left[ e^{\beta J} + (q-1) e^{-\beta K} \right]^8$ .    (5)

(In contact, the two amino-acids have six hydration water sites,
against eight when they are far apart; the two released sites belong to
the bulk and contribute the factor $q^2$.) The effective interaction is
defined as

$\epsilon_{HH} = T \ln (Z_c / Z_0)$    (6)

($\epsilon_{HH}$ is positive if attractive, with this definition). The
$T \to \infty$ limit is

$\epsilon_{HH}(T \to \infty) = 2K - \frac{2}{q} (K + J)$    (7)

and is attractive for large values of q: it is the usual hydrophobic
effective interaction. Yet the T = 0 limit is $\epsilon_{HH} = -2J$,
repulsive. A meaningful effective interaction should at least include
such a temperature dependence. Actually, the strong temperature
dependence of $\epsilon_{HH}$ is not the only limitation to a
definition of an attractive HH interaction. Indeed, such an interaction
can be meaningful only for amino-acids surrounded by water molecules,
but it cannot be defined in the core of proteins, where water is
absent. As a consequence, in the absence of some true interactions
between amino-acids, the hydrophobic interaction alone is not able to
favor thermodynamically the native state against different compact
states obtained by reordering only the core of the protein. As an
example, the two conformations in Figure 5, corresponding to the
sequence PPHPPPPPPPPPHHHP, have the same probability to occur in our
model, since they hide and expose to water the same number of H
amino-acids.
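The temperature dependence of the two-body effective interaction can be evaluated directly. The sketch below assumes the definition $\epsilon_{HH} = T \ln(Z_c/Z_0)$ with two shell sites released on contact, so that $\epsilon_{HH} = 2T \ln[q / (e^{J/T} + (q-1) e^{-K/T})]$ with $k_B = 1$; the function names and the log-sum-exp evaluation are choices made for this example.

```python
import math


def eps_hh(T, J=1.0, K=2.0, q=1000):
    """Effective H-H interaction, positive when attractive (k_B = 1, q > 1)."""
    # ln z with z = exp(J/T) + (q - 1) exp(-K/T), evaluated without overflow.
    a = J / T
    b = math.log(q - 1) - K / T
    m = max(a, b)
    log_z = m + math.log(math.exp(a - m) + math.exp(b - m))
    return 2.0 * T * (math.log(q) - log_z)


def eps_hh_high_t(J=1.0, K=2.0, q=1000):
    """High-temperature limit 2K - (2/q)(K + J)."""
    return 2.0 * K - 2.0 * (K + J) / q


if __name__ == "__main__":
    for T in (0.01, 0.1, 0.2, 0.5, 1.0, 10.0):
        print(f"T = {T:5.2f}   eps_HH = {eps_hh(T):+.4f}")
    print("high-T limit:", eps_hh_high_t())
```

With J = 1, K = 2 and q = 1000 the interaction starts near $-2J$ at low temperature, changes sign, and approaches the high-temperature value close to $2K$, so a single temperature-independent contact energy cannot reproduce it.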



4.2. Many-Body Interactions 



When a protein is folding, its amino-acids find an ever-changing
environment that depends on water and on the other amino-acids. Even
the reliability of two-body effective interactions versus many-body
ones is an open issue still to be settled. In our model we can compute
some many-body effective interactions. First, we observe that
next-nearest-neighbor interactions are equal to nearest-neighbor
interactions, $\epsilon_{nnn} = \epsilon_{HH}$, since there are again
six water sites in contact with the hydrophobic molecules. Then we
consider three H particles in an angle configuration (see Figure 6a).
The effective energy can be computed as

$\epsilon_{HHH} = 5T \ln \frac{q}{(q-1) e^{-\beta K} + e^{\beta J}} \neq 2 \epsilon_{HH} + \epsilon_{nnn}$ .    (8)

Already this simple situation shows that integrating away the solvent
degrees of freedom cannot be reduced to simple two-body effective
interactions: the intrinsic three-body contribution to the free energy
amounts to roughly 20% of the total. Many-body effects can be
pronounced also if the corner amino-acid is polar (see Figure 6b). In that
case the effective energy is

$\epsilon_{HPH} = 3T \ln \frac{q}{(q-1) e^{-\beta K} + e^{\beta J}} \neq 2 \epsilon_{HP} + \epsilon_{nnn}$    (9)

with $\epsilon_{HP} = T \ln [ q / (e^{\beta J} + (q-1) e^{-\beta K}) ]$.
Actually, the non-additivity and more generally the context-dependence
of water-mediated effective interactions has also been recently pointed
out [16].
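In this model every effective energy is an integer multiple of $T \ln[q / (e^{\beta J} + (q-1) e^{-\beta K})]$, the integer being the number of hydration-shell water sites returned to the bulk when the particles are brought together. The sketch below counts that integer on the square lattice for small clusters, so the non-additivity can be read off by comparing a cluster with the sum over its pairs. The dictionary-based cluster description and the function name are assumptions for this example.

```python
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def freed_sites(particles):
    """Shell water sites returned to the bulk when the cluster is assembled.

    `particles` maps (x, y) -> "H" or "P". The reference state has every
    particle isolated, so each H starts with four shell sites.
    """
    shell = set()
    n_h = 0
    for (x, y), kind in particles.items():
        if kind != "H":
            continue
        n_h += 1
        for dx, dy in NEIGHBOURS:
            site = (x + dx, y + dy)
            if site not in particles:
                shell.add(site)    # shared water sites are counted once
    return 4 * n_h - len(shell)


if __name__ == "__main__":
    hh_nn = freed_sites({(0, 0): "H", (1, 0): "H"})
    hh_nnn = freed_sites({(0, 0): "H", (1, 1): "H"})
    hp_nn = freed_sites({(0, 0): "H", (1, 0): "P"})
    hhh = freed_sites({(0, 0): "H", (1, 0): "H", (0, 1): "H"})
    hph = freed_sites({(0, 0): "P", (1, 0): "H", (0, 1): "H"})
    print("HHH angle:", hhh, "vs pairwise sum", 2 * hh_nn + hh_nnn)
    print("HPH angle:", hph, "vs pairwise sum", 2 * hp_nn + hh_nnn)
```

For the H-H-H angle the count is 5 against a pairwise sum of 6, which is the origin of the roughly 20% three-body contribution; for the H-P-H angle the same counting gives 3 against 4.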
4.3. Validity of Effective Interactions 

This model suggests that it is improper to define interactions of
hydrophobic origin inside proteins, and that the detailed structure of
the cores of proteins should be stabilized by other mechanisms.
Recently, many methods have been devised to derive effective potentials
able to stabilize protein structures: some of them [12] are of a
statistical nature and have been shown to be intrinsically flawed [13];
some other methods that have a more rigorous physical basis have also
been proposed [14]. Still, however physically sound the method, the
presence of intrinsic many-body effective potentials casts a shadow
over any simplistic two-body description of amino-acid/amino-acid
interactions. Moreover, effective interactions can be safely defined
only when they replace an environment that does not change [15]. It is
therefore intrinsically difficult to define effective potentials of
general validity between amino-acids: our model points out such a
problem for hydrophobic interactions.



5. Conclusions 

In conclusion, we have introduced a model of proteins in water that is 
able to reproduce the known features of proteins (namely, cold destabil- 
isation and warm denaturation, a native state with a mostly hydropho- 
bic core, and the correct free energy, enthalpy and entropy of folding). 
We also checked our results for different protein lengths, sequences, 
parameter values and even implementing the full MLG model for the 
description of water. Although some details may change, the overall 
behavior is consistent and robust. Moreover, lattice models are intended 
to be only qualitatively instructive, whereas a quantitative description 
can be given only by more detailed off-lattice models. 

Our model is a first step in an interpolation between fully microscopic
models, where water is dealt with at a molecular level, and fully
effective models, where water is accounted for by effective potentials
between amino-acids. As we have shown, the reliability of simple
two-body, context-independent effective potentials is, at least as a
matter of principle, questionable (very recently, Park et al. proposed
at least a distinction between surface and core two-body potentials
[?]): within our scheme such a problem emerges clearly. Although our
model is an extremely simplified one, we believe that moving toward
more realistic descriptions of the water/amino-acid system would
complicate the structure of the effective potentials, rather than
simplifying it, and that therefore a much better understanding is
needed.

Figure 1. Bimodal effective models. Panel (a): MLG model, with bimodal energy
distributions both for bulk and shell water molecules. The lower levels represent ordered
groups of water molecules, the higher levels disordered ones. The order of energies
and of degeneracies, as obtained from experiments, is $E_{ds} > E_{db} > E_{ob} > E_{os}$ and
$q_{ds} > q_{db} > q_{ob} > q_{os}$ (ds = disordered shell, os = ordered shell, db = disordered
bulk, ob = ordered bulk). Panel (b): the simplified bimodal energy distribution, with
just two free parameters, K and q, since we can take J as energy scale.

Figure 2. Specific heat, monomer-monomer contacts and number of water sites in
an excited state for the protein shown in the inset; K = 2, J = 1, q = 1000.

Figure 3. Same as in Figure 2, for a full implementation of the MLG model, with
$E_{os} = -1.4$, $E_{ds} = 1.8$, $E_{ob} = -1$, $E_{db} = 1$, $q_{os} = 1$, $q_{ds} = 999$, $q_{ob} = 50$, $q_{db} = 950$.

Figure 4. Free energy, enthalpy and entropy (times T) differences between denatured
conformations and the native one (shown in the inset of Figure 2), for the same
parameter values as in Figure 3. Since $T \Delta S$ grows linearly at high temperatures, $\Delta S$
saturates.

Figure 5. Two different conformations, (a) and (b), of the same sequence, differing
only by a reorganization of the core amino-acids.

Figure 6. Angle configurations used to compute the three-body effective energies:
(a) three H particles and (b) two H and one P particles.